Health Checks

Health checks allow NGINX to determine whether a backend server is healthy or unhealthy and decide whether to send traffic to it.

Goal:

If backend is unhealthy → stop sending requests to it
If backend recovers → resume traffic

This ensures:

  • High availability
  • Fault tolerance
  • Better user experience

Types of Health Checks in NGINX

NGINX supports two kinds of health checks:

Type | Availability
Passive health checks | ✅ NGINX Open Source
Active health checks | ❌ Open Source (✅ NGINX Plus)

Passive Health Checks (Open Source NGINX)

With passive health checks, NGINX does not probe backends on its own.

Instead, it:

  • Proxies real client requests to the backends
  • Monitors how each backend responds
  • Counts failed attempts and marks a server unavailable once max_fails is reached

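Failure conditions include:

  • Connection timeout
  • Connection refused
  • Invalid response (empty or malformed header)
  • HTTP 500 / 502 / 503 / 504, but only when those codes are listed in proxy_next_upstream

By default, only errors, timeouts, and invalid headers count as failed attempts.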
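
Whether a 5xx response counts toward max_fails depends on proxy_next_upstream, whose default value is "error timeout". The following is a minimal sketch of opting in, assuming an upstream named app_backend (as in the examples in this document) and that retrying on 5xx is acceptable for the application:

```nginx
location / {
    proxy_pass http://app_backend;

    # Default is "error timeout". Listing the http_5xx codes makes those
    # responses count as failed attempts and allows a retry on the next server.
    proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
}
```

Requests with non-idempotent methods such as POST are normally not passed to another server once they have been sent, unless the non_idempotent parameter is added as well.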

Core Directives for Passive Health Checks

These are defined inside the upstream block.

max_fails

server backend1 max_fails=3;

Number of failed attempts, counted within the fail_timeout window, before the server is marked unavailable

fail_timeout

server backend1 fail_timeout=30s;

  • Time window in which max_fails failed attempts must occur
  • Also the length of time the server is then considered unavailable
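
When neither parameter is given, NGINX applies its documented defaults. The sketch below contrasts the default, an explicit setting, and disabled failure accounting; the third address (10.0.0.13) is a hypothetical server added only for illustration:

```nginx
upstream app_backend {
    # No parameters: equivalent to max_fails=1 fail_timeout=10s (the defaults).
    server 10.0.0.11:8080;

    # Explicit values: 3 failed attempts within 30s mark the server
    # unavailable for the next 30s.
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;

    # max_fails=0 disables failure accounting, so this server is never
    # marked unavailable by passive checks (hypothetical third server).
    server 10.0.0.13:8080 max_fails=0;
}
```

With the default max_fails=1, a single timeout is enough to eject a server for 10 seconds, which may be too sensitive for backends with occasional slow responses.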

Combined Example

upstream app_backend {
    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
}

  • If a server fails 3 times within 30 seconds, NGINX marks it unavailable
  • While it is unavailable, traffic is sent only to the remaining servers
  • After 30 seconds, NGINX tries the server again with live traffic

Full Passive Health Check Example

upstream api_backend {
    # Prefer the server with the fewest active connections
    least_conn;

    server 10.0.0.11:8080 max_fails=3 fail_timeout=20s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=20s;
}

server {
    listen 80;

    location /api/ {
        proxy_pass http://api_backend;

        # Exceeding either timeout counts as a failed attempt
        proxy_connect_timeout 3s;
        proxy_read_timeout 10s;
    }
}

Request Flow Explanation

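  1. Client sends a request to NGINX
  2. NGINX proxies the request to a backend
  3. The attempt fails if the backend:
    • Times out
    • Refuses the connection
    • Returns a 5xx status that is listed in proxy_next_upstream
  4. NGINX increments that backend's failure counter and, where allowed, passes the request to the next server
  5. Once max_fails is reached within fail_timeout → the backend is skipped
  6. After fail_timeout, the backend is tried again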
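
The retry behaviour in step 4 can be bounded so that one client request does not cycle through every backend. The following is a sketch reusing the api_backend upstream and the timeouts from the example above; the limits of 2 tries and 15s are assumed values, not recommendations:

```nginx
location /api/ {
    proxy_pass http://api_backend;

    proxy_connect_timeout 3s;
    proxy_read_timeout    10s;

    # Retry on another server for connection errors and timeouts
    # (this is also the default set).
    proxy_next_upstream error timeout;

    # Assumed limits: at most 2 attempts per request, and no further
    # retries once 15s have elapsed in total.
    proxy_next_upstream_tries   2;
    proxy_next_upstream_timeout 15s;
}
```

Without these limits, a slow failure mode (for example, read timeouts on every server) can multiply the latency a client sees by the number of servers in the group.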

What Happens When All Backends Fail?

If all servers in the upstream are marked unavailable:

  • NGINX temporarily retries failed servers
  • If they are still unavailable → the client receives 502 Bad Gateway, and the error log records "no live upstreams"
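
Because this 502 is generated by NGINX itself, it can be replaced with a friendlier static page via error_page. The following is a minimal sketch, assuming an upstream named app_backend and a static file at /usr/share/nginx/html/50x.html (the path is an assumption and varies by installation):

```nginx
server {
    listen 80;

    location / {
        proxy_pass http://app_backend;
    }

    # Serve a static page when NGINX generates a gateway error.
    error_page 502 503 504 /50x.html;

    location = /50x.html {
        root /usr/share/nginx/html;  # assumed path
        internal;                    # not reachable by direct client requests
    }
}
```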

Passive Health Check Failure Conditions (Important)

Which conditions NGINX counts as failed attempts:

Condition | Counts as Failure
TCP connection refused | ✅
Timeout | ✅
No response | ✅
HTTP 500 / 502 / 503 / 504 | ✅ only if listed in proxy_next_upstream
HTTP 404 | ❌
HTTP 401 | ❌

backup Servers (Failover Strategy)

upstream app_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.99:8080 backup;
}

  • The backup server receives traffic only when all primary servers are unavailable
  • Useful for DR or reduced-capacity nodes

Temporarily Disable a Backend

server 10.0.0.11:8080 down;

  • The server is marked permanently unavailable and receives no traffic
  • Useful for maintenance
  • To re-enable it, remove the down parameter and reload NGINX
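
One case where down is preferable to deleting the server line is ip_hash: the NGINX documentation recommends marking the server down so that the existing mapping of client IP addresses to the remaining servers is preserved. A sketch, with 10.0.0.13 as a hypothetical third server:

```nginx
upstream app_backend {
    ip_hash;

    server 10.0.0.11:8080 down;   # under maintenance; receives no traffic
    server 10.0.0.12:8080;
    server 10.0.0.13:8080;        # hypothetical third server
}
```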

Active Health Checks (NGINX Plus Only)

Active health checks are not available in open-source NGINX.

How Active Health Checks Work

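  • NGINX sends periodic probe requests to each backend
  • Probes target a dedicated endpoint (e.g. /health)
  • A failing backend is removed before user traffic reaches it

The directive is placed in the location that proxies to the upstream:

health_check uri=/health interval=5s fails=2 passes=1;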
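
For context, here is a sketch of where health_check sits in an NGINX Plus configuration. It assumes all blocks are inside the http context; the zone size and the match block name health_ok are illustrative choices, not taken from the original:

```nginx
upstream app_backend {
    # Shared memory zone; active health checks require the upstream
    # group to live in shared memory. Size is an assumed value.
    zone app_backend 64k;

    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

# Hypothetical match block: a probe passes only on HTTP 200.
match health_ok {
    status 200;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;

        # Probe /health every 5s; 2 failed probes mark the server unhealthy,
        # 1 passing probe marks it healthy again.
        health_check uri=/health interval=5s fails=2 passes=1 match=health_ok;
    }
}
```

Without a match parameter, a probe passes when the response status is 2xx or 3xx.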

Passive vs Active Health Checks

Feature | Passive | Active
Available in OSS | ✅ | ❌
Sends probes | ❌ | ✅
Detects failures early | ❌ | ✅
Uses real traffic | ✅ | ❌
Complexity | Low | Medium

Real-World Production Example

upstream web_backend {
    least_conn;

    server 10.0.1.10:8080 max_fails=2 fail_timeout=15s;
    server 10.0.1.11:8080 max_fails=2 fail_timeout=15s;
    server 10.0.1.99:8080 backup;
}

server {
    listen 80;

    location / {
        proxy_pass http://web_backend;
        proxy_connect_timeout 2s;
        proxy_read_timeout 30s;
    }
}
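
Passive checks are easier to tune when the access log shows which backend handled each request. The sketch below uses NGINX's built-in upstream variables; the format name upstream_log and the log file path are assumptions:

```nginx
# In the http context.
log_format upstream_log '$remote_addr "$request" status=$status '
                        'upstream=$upstream_addr '
                        'upstream_status=$upstream_status '
                        'upstream_time=$upstream_response_time';

# Inside the server block that proxies to web_backend.
access_log /var/log/nginx/upstream_access.log upstream_log;
```

When a request was retried on another server, $upstream_addr and $upstream_status contain one comma-separated entry per attempt, which makes failed attempts visible.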

Summary

  • Health checks prevent traffic to failed backends
  • Open-source NGINX supports passive health checks
  • Key directives:
    • max_fails
    • fail_timeout
    • backup
  • Active health checks require NGINX Plus
  • Tune max_fails, fail_timeout, and the proxy timeouts to control how quickly a failing backend is ejected and retried